scsi/block: NUMA-local scan allocations, shared-tag path cleanup, and SCSI I/O counters #752
Open
blktests-ci[bot] wants to merge 3 commits into linus-master_base
Conversation
Author
Upstream branch: d60bc14 | force-pushed 6b4d829 to ceec5ed
Upstream branch: b4e0758 | force-pushed 421cf99 to 6e400f8, ceec5ed to 3b54e52
Upstream branch: 6596a02 | force-pushed 6e400f8 to df480d9, 3b54e52 to 6a0b974
Upstream branch: 507bd4b | force-pushed df480d9 to 04a61e0, 6a0b974 to 59ca59b
Upstream branch: dd6c438 | force-pushed 04a61e0 to c5c14f9, 94f0438 to 857ada9
…apter

When a host adapter is attached to a specific NUMA node, allocating scsi_device and scsi_target via kzalloc() may place them on a remote node. All hot-path I/O accesses to these structures then cross the NUMA interconnect, adding latency and consuming inter-node bandwidth.

Use kzalloc_node() with dev_to_node(shost->dma_dev) so allocations land on the same node as the HBA, reducing cross-node traffic and improving I/O performance on NUMA systems.

Signed-off-by: James Rizzo <[email protected]>
Signed-off-by: Sumit Saxena <[email protected]>
Original patch [1] by Bart Van Assche; this version is rebased onto the current tree. In testing it improves IOPS by roughly 16-18% by removing the fair-sharing throttle on shared tag queues.

This patch removes the following code and structure members:
- The function hctx_may_queue().
- blk_mq_hw_ctx.nr_active and request_queue.nr_active_requests_shared_tags, and all the code that modifies these two member variables.

[1]: https://lore.kernel.org/linux-block/[email protected]/

Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Sumit Saxena <[email protected]>
iorequest_cnt and iodone_cnt are updated on every command dispatch and completion, often from different CPUs on high-queue-depth workloads. Using adjacent atomic_t fields caused cache-line contention between the submission and completion paths.

Represent these statistics with struct percpu_counter so increments are mostly local to each CPU, avoiding false sharing without growing struct scsi_device further for cache-line padding.

Suggested-by: Bart Van Assche <[email protected]>
Signed-off-by: Sumit Saxena <[email protected]>
Force-pushed c5c14f9 to 41afac9
Pull request for series with
subject: scsi/block: NUMA-local scan allocations, shared-tag path cleanup, and SCSI I/O counters
version: 2
url: https://patchwork.kernel.org/project/linux-block/list/?series=1083261